Parsing Universal Dependencies without training
Abstract
We propose UDP, the first training-free parser for Universal Dependencies (UD). Our algorithm is based on PageRank and a small set of head attachment rules. It features two-step decoding to guarantee that function words are attached as leaf nodes. The parser requires no training, and it is competitive with a delexicalized transfer system. UDP offers a linguistically sound unsupervised alternative to cross-lingual parsing for UD, which can be used as a baseline for such systems. The parser has very few parameters and is distinctly robust to domain change across languages.
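The abstract names three ingredients: PageRank, head attachment rules, and two-step decoding that keeps function words as leaves. The following is a minimal sketch of how such pieces could fit together, not the authors' implementation. The rule table `HEAD_RULES`, the `CONTENT` tag set, the nearest-candidate attachment heuristic, and the function names are all illustrative assumptions; the abstract does not specify the actual rules.

```python
# Illustrative sketch only: the rule table and heuristics below are assumptions,
# not the rules used by UDP.
CONTENT = {"NOUN", "PROPN", "VERB", "ADJ", "ADV"}

# Hypothetical head rules: dependent UPOS tag -> UPOS tags allowed as its head.
HEAD_RULES = {
    "NOUN": {"VERB", "NOUN"},
    "PROPN": {"VERB", "NOUN", "PROPN"},
    "VERB": {"VERB"},
    "ADJ": {"NOUN", "VERB"},
    "ADV": {"VERB", "ADJ"},
    "DET": {"NOUN", "PROPN"},
    "ADP": {"NOUN", "PROPN"},
    "AUX": {"VERB"},
}


def allowed(dep_tag, head_tag):
    """True if the rule table lets head_tag govern dep_tag."""
    return head_tag in HEAD_RULES.get(dep_tag, set())


def pagerank(tags, damping=0.85, iterations=50):
    """Power-iteration PageRank over edges dependent -> candidate head."""
    n = len(tags)
    out = [[j for j in range(n) if j != i and allowed(tags[i], tags[j])]
           for i in range(n)]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        new = [(1.0 - damping) / n] * n
        for i, targets in enumerate(out):
            # Words with no candidate head spread their mass uniformly.
            receivers = targets if targets else range(n)
            share = damping * rank[i] / len(receivers)
            for j in receivers:
                new[j] += share
        rank = new
    return rank


def nearest(i, candidates):
    """Candidate position closest to position i."""
    return min(candidates, key=lambda j: abs(j - i))


def parse(tags):
    """Return 1-based head indices (0 marks the root) for a list of UPOS tags."""
    n = len(tags)
    if n == 0:
        return []
    rank = pagerank(tags)
    content = sorted((i for i in range(n) if tags[i] in CONTENT),
                     key=lambda i: -rank[i])
    if not content:
        # Degenerate case: no content words, so the first token becomes root.
        return [0] + [1] * (n - 1)
    heads = [None] * n
    attached = []
    # Step 1: attach content words in rank order; the top-ranked one is root.
    for i in content:
        if not attached:
            heads[i] = 0
        else:
            cands = [j for j in attached if allowed(tags[i], tags[j])]
            heads[i] = nearest(i, cands or attached[:1]) + 1
        attached.append(i)
    # Step 2: attach function words to content words only, so they stay leaves.
    for i in range(n):
        if heads[i] is None:
            cands = [j for j in attached if allowed(tags[i], tags[j])]
            heads[i] = nearest(i, cands or attached) + 1
    return heads


# "The dog chased a cat quickly"
tags = ["DET", "NOUN", "VERB", "DET", "NOUN", "ADV"]
print(parse(tags))
```

Because the second step only ever selects heads from the already attached content words, no function word can acquire a dependent, which is the leaf guarantee the abstract describes.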
Similar resources
RACAI's Natural Language Processing pipeline for Universal Dependencies
This paper presents RACAI’s approach, experiments and results at the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. We handle raw text and we cover tokenization, sentence splitting, word segmentation, tagging, lemmatization and parsing. All results are reported under strict training, development and testing conditions, in which the corpora provided for the sha...
A rule-based system for cross-lingual parsing of Romance languages with Universal Dependencies
This article describes MetaRomance, a rule-based cross-lingual parser for Romance languages submitted to the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. The system is an almost delexicalized parser which does not need training data to analyze Romance languages. It contains linguistically motivated rules based on PoS-tag patterns. The rules included in MetaR...
Universal Dependencies for Sanskrit
We present the first steps towards a treebank of Sanskrit within the Universal Dependencies framework. Our dataset is tiny at the moment, consisting of fewer than 200 sentences, a result of a summer internship project. Nevertheless, this seems to be, to the best of our knowledge, the first publicly available piece of syntactically annotated Sanskrit text. We also present a parsing experiment, wit...
Universal Dependencies Parsing for Colloquial Singaporean English
Singlish can be interesting to the ACL community both linguistically as a major creole based on English, and computationally for information extraction and sentiment analysis of regional social media. We investigate dependency parsing of Singlish by constructing a dependency treebank under the Universal Dependencies scheme, and then training a neural network model by integrating English syntact...
UDPipe: Trainable Pipeline for Processing CoNLL-U Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing
Automatic natural language processing of large texts often presents recurring challenges in multiple languages: even for the most advanced tasks, the texts are first processed by basic processing steps, from tokenization to parsing. We present an extremely simple-to-use tool consisting of one binary and one model (per language), which performs these tasks for multiple languages without the need fo...